(57) ABSTRACT

When a source program containing annotations is processed 
by a user-selected tool, the annotations in the source pro- 
gram are detected by a lexer and passed to an annotation 
processor corresponding to the selected tool. The system 
contains a number of annotation processors and a number of 
program processing tools, and the annotation processor to 
which the annotations are passed is selected based upon the 
user-selected tool. The selected annotation processor con- 
verts annotations compatible with the user-selected tool into 
annotation tokens and returns the annotation tokens to the 
lexer. The lexer generates tokens based upon the 
programming-language statements in the source program, 
and passes both the tokens and annotation tokens to a parser. 
The parser, in turn, assembles the tokens and annotation 
tokens into an abstract syntax tree, which is then passed to 
the user-selected tool for further processing. 

12 Claims, 7 Drawing Sheets 
SYSTEM AND METHOD FOR LEXING AND
PARSING PROGRAM ANNOTATIONS

The present invention relates generally to compilers and
program analyzers, and more particularly to an improved
system and method for lexing and parsing computer programs
that include tool-specific annotations.

BACKGROUND OF THE INVENTION

A compiler or a source-level program analyzer is capable
of parsing source programs, which are written in a particular
programming-language. Compilers generally include a lexer
and a parser. Similarly, other types of programming tools
include a lexer and parser. The lexer reads the source-level
program and generates tokens based upon the programming-language
statements in the source-level program. The lexer
passes the generated tokens to the parser, which assembles
the tokens into an abstract syntax tree (AST). The abstract
syntax tree is further processed by one or more tools, such
as a compiler back-end or a program correctness tester.

Tool-specific annotations are typically used in the source
program to give the tools special instructions; for example,
"generate the following machine code instruction at this
point in the target code," "generate code that uses a machine
register for this program variable," "ignore possible errors of
type x in this program statement," or "check that this
parameter is always a non-zero integer." As new tools are
devised, and as new features are added to those tools, the
lexer and parser used by the tools will often require
corresponding revisions.

The present invention addresses the problem of revising
the lexer and parser for a programming-language when new
tools are created, or new annotation-based features are added
to tools. In particular, using the present invention, tool-specific
annotations are effectively separated from
programming-language-specific statements. Further, the
present invention makes it relatively simple to implement a
wide range of tool-specific annotations, including annotations
that employ a complex programming-language.

Two conventional approaches that allow tool-specific
annotations are known. In a first approach, tool-specific
annotations are recognized and processed by the lexer. In a
second approach, tool-specific annotations are recognized
and processed by the parser.

An example of the first conventional approach to supporting
tool-specific annotations is the way a "#line N"
tool-specific annotation may be handled by a C compiler.
There, the C compiler lexer may keep track of the line
number information of every token it recognizes. If the C
compiler lexer reads the "#line N" annotation, then the C
compiler lexer changes an internal counter to N, as if the
next line were N, and proceeds to read the next token. Since
the lexical structure of the "#line N" is so simple, a standard
lexer, such as the C compiler lexer, can recognize the
tool-specific annotation.

An example of the second conventional approach to
supporting tool-specific annotations in a compiler is the way
a compiler for the Modula-3 language handles an "<*
ASSERT P *>" tool-specific annotation. It is treated as if it
were a Modula-3 program statement. Although "P" is an
expression, it can be parsed appropriately because the
annotation is recognized by the Modula-3 parser.

The conventional methods for recognizing tool-specific
annotations, while functional, are less than satisfactory in
practice. If a new tool (such as a type-checker or an
error-checker) is created for a particular programming-language,
extensive recoding of the standard programming-language
lexer and parser may be required to handle program
annotations specific to that tool. Even a simple
modification made to the syntax of the annotations used by
an existing tool may require extensive modification of the
lexer and parser of that tool.

SUMMARY OF THE INVENTION

In the system and methods of the present invention,
tool-specific annotations are recognized by the lexer for the
programming-language, but the lexing and parsing of the
tool-specific annotations are handled by a separate,
tool-specific annotation processor.

A compiler or other programming tool includes a lexer
capable of detecting computer programming-language units
present in a character stream. The lexer generates a stream
of tokens based upon these units. The lexer is further capable
of detecting the units of computer programming-language
statements such as identifiers. As the lexer detects
tool-specific annotations in the character stream, it passes them
to the back-end annotation processor. The back-end annotation
processor is designed to lex and parse the annotations
for a specific tool (or set of tools). In a system having a
plurality of tools, each tool (or set of tools) has a corresponding
back-end annotation processor for its tool-specific annotations.

When a back-end annotation processor receives a
tool-specific annotation from the lexer, the annotation processor
generates an annotation token based upon the tool-specific
annotation and returns the annotation token to the lexer. The
lexer in turn adds the annotation token to the end of a list of
tokens it has generated so far. The lexer passes the mixed
stream of tokens, some generated within the lexer, and some
generated by the back-end annotation processor, to the
parser. The parser assembles the stream of tokens and
annotation tokens into an abstract syntax tree and passes the
tree to the aforementioned tool. The tool processes the
annotation tokens as well as the other tokens in the abstract
syntax tree.

In a preferred embodiment, at least one of the annotation
processors has the capability of generating an annotation
token that includes an abstract syntax tree within the annotation
token. The abstract syntax tree within the annotation
token may be referred to as a secondary abstract syntax tree
and the abstract syntax tree assembled by the parser may be
referred to as the primary abstract syntax tree. In this
embodiment, the annotation token including a secondary
abstract syntax tree is incorporated into the primary abstract
syntax tree in a context-sensitive manner by the parser.

In a preferred embodiment, an annotation processor
includes an annotation lexer and an annotation parser.
Preferably, the annotation lexer is context-free and the
annotation parser is context-sensitive.

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be
more readily apparent from the following detailed description
and appended claims when taken in conjunction with
the drawings, in which:

FIG. 1 is a flow chart of a method in accordance with the
present invention.

FIG. 2 is a block diagram of a system in accordance with
the present invention.

FIGS. 3A and 3C are examples of parse trees. FIG. 3B is
the AST representation of FIG. 3A and FIG. 3D is the AST
representation of FIG. 3C.
FIG. 4A is an example of an annotation token and FIGS.
4B and 4C provide examples of how an annotation token is
incorporated into an abstract syntax tree.

FIGS. 5A and 5B are examples of annotation classes.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Consider a system that takes as input a computer program 
and processes the program in some way. Examples of such 
systems are compilers, type checkers, lint-like program 
checkers, extended static checkers, program verifiers, pro- 
gram documentation systems, and the like. The input com- 
puter program may be written in a standard programming- 
language, but the programming tool may need additional 
information provided by the programmer and specific to the 
tool. 

Therefore, annotations can be defined for the tool. If the
annotation language is defined as an extension of an existing
programming-language, then one would want the implementation
of the programming tool's parser to be defined as an
extension of the parser for the standard programming-language,
as much as possible.

Lexer, Parser, Streams, and Tokens 

FIG. 1 shows the data flow 100 of an implementation of 
a method that converts an input program into a data structure 
that a tool can then manipulate and analyze. A source file 102 
is read by a lexer 104. Lexer 104 recognizes programming- 
language units in source file 102 and generates tokens 
representing these units. Lexer 104 also detects annotations 
in source file 102. When lexer 104 detects an annotation, it 
passes the annotation 106 to the back-end 120. In some 
implementations, the lexer 104 may recognize comments in 
the program, and may pass all comments to the back-end 
120. Since the lexer 104 for any programming-language 
would normally already be coded to recognize comments, 
this implementation minimizes changes to the lexer 104. The 
lexer 104, instead of ignoring (and effectively discarding) 
program comments, passes them to the back-end 120. 

Back-end 120 includes a plurality of annotation proces- 
sors 124 and a corresponding set of tools 122. In some 
implementations, if two or more of the tools use identical 
sets of annotations, it is possible that one of the annotation 
processors will correspond to two or more of the tools. 

Generally, the process of lexing and parsing a program is
initiated when a user of the system requests a user-specified
tool to process a user-specified program. All the tools 122 in
the back-end are assumed, for the purposes of this
explanation, to be tools used to process computer programs
in one particular programming-language, such as C or C++
or Java (trademark of Sun Microsystems, Inc.). In accordance
with the present invention, and unlike prior systems,
the lexer 104 and parser 130 are generic to the
programming-language for the specified program. That is,
the same lexer and parser are used with all the tools 122. All
tool-specific annotation handling is performed by the particular
back-end annotation processor 124 that corresponds
to the user-specified tool.
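The selection of an annotation processor by user-specified tool can be sketched as a simple lookup table. The sketch below is illustrative only: the tool names, the processor functions, and the tuple shape used for annotation tokens are assumptions made for this example and do not appear in the description above. It also shows two tools sharing one processor, which the description permits when the tools use identical sets of annotations.

```python
from typing import Callable, Dict, Optional, Tuple

AnnotationToken = Tuple[str, str]
Processor = Callable[[str], Optional[AnnotationToken]]


def checker_annotations(text: str) -> Optional[AnnotationToken]:
    """Hypothetical processor for tools that understand NON_NULL and EVEN."""
    label = text.strip()
    if label in {"NON_NULL", "EVEN"}:
        return ("ANNOTATION_TOKEN", label)
    return None


def optimizer_annotations(text: str) -> Optional[AnnotationToken]:
    """Hypothetical processor for a tool that understands FREQUENTLY_USED."""
    label = text.strip()
    if label == "FREQUENTLY_USED":
        return ("ANNOTATION_TOKEN", label)
    return None


# Hypothetical tool names. Two tools map to the same processor because
# they are assumed to use identical sets of annotations.
PROCESSOR_FOR_TOOL: Dict[str, Processor] = {
    "type_checker": checker_annotations,
    "static_checker": checker_annotations,
    "compiler": optimizer_annotations,
}


def select_processor(tool_name: str) -> Processor:
    """Return the annotation processor for the user-specified tool."""
    return PROCESSOR_FOR_TOOL[tool_name]
```

Under these assumptions the generic lexer and parser never change when a tool is added; only the table gains an entry.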

There are some circumstances in which a user may
specify that a particular program is to be processed by two
tools. For instance, the user may specify that a particular
program is to be first processed by a compiler, and then if no
errors are detected by the compiler, the program is to be
processed by a documentation generator. Similarly, in some
circumstances a first tool in the back-end may automatically
invoke another tool in the back-end to process the AST
either before or after the first tool. In these circumstances,
the front-end, with the lexer and parser, only converts the
program into an AST once, and thus only passes the annotations
to one particular annotation processor. In such
circumstances, it is assumed that either (A) both tools use
the same annotations, or (B) that one or both tools include
facilities for ignoring annotation tokens in the AST not
applicable to that tool.

When back-end 120 receives annotation 106 from lexer
104, the annotation is sent to the annotation processor 124
for the user-specified tool. After processing the annotation
into an annotation token, annotation processor 124 returns
the annotation token 126 to lexer 104. If the annotation was
a simple comment, including no applicable instructions for
the user-specified tools 122, annotation processor 124
returns a NULL value to lexer 104. Lexer 104 passes a stream of
tokens, generated by lexer 104, and annotation tokens,
generated by annotation processor 124, to parser 130. Parser
130 assembles the token and annotation token stream into an
abstract syntax tree (AST) 132 according to the grammar of
the programming-language in which the input source file
102 is written. In a preferred embodiment, the
programming-language grammar used to write the input
source file 102 is extended to include context-sensitive
annotation slots. For example, a programming-language that
includes the statement

S ::= Var X in S end

may be extended to

S ::= Var X annotation in S end

where annotation represents a slot where an annotation such
as "Frequently Used" may be placed. Parser 130 passes AST
132 to back-end 120. The user-specified tool or tools 122
process the AST 132, including the annotation tokens
therein, to produce target file 140.
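The round trip between lexer 104 and annotation processor 124 can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the regular expression, the tuple representation of tokens, and the names `lex` and `demo_processor` are hypothetical. The lexer hands every comment body to the processor and appends whatever comes back, unless the processor returns None (standing in for the NULL value).

```python
import re
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, object]

# One alternative per lexical unit; comments are captured, not discarded.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/)
  | (?P<NUMBER>\d+)
  | (?P<IDENTIFIER>[A-Za-z_]\w*)
  | (?P<PLUS>\+)
  | (?P<MINUS>-)
  | (?P<SPACE>\s+)
""", re.VERBOSE | re.DOTALL)


def lex(source: str,
        annotation_processor: Callable[[str], Optional[Token]]) -> List[Token]:
    """Generic lexer: delegates every comment body to the given processor."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SPACE":
            continue
        if kind == "COMMENT":
            token = annotation_processor(text[2:-2])  # strip "/*" and "*/"
            if token is not None:                     # None plays the role of NULL
                tokens.append(token)
        elif kind == "NUMBER":
            tokens.append(("NUMBER", int(text)))
        elif kind == "IDENTIFIER":
            tokens.append(("IDENTIFIER", text))
        else:
            tokens.append((kind, None))
    tokens.append(("END_OF_STREAM", None))
    return tokens


def demo_processor(comment_body: str) -> Optional[Token]:
    """Hypothetical processor: bodies starting with "@" are annotations."""
    if comment_body.startswith("@"):
        return ("ANNOTATION_TOKEN", comment_body[1:].strip())
    return None
```

For example, `lex("x /* note */ + /*@ EVEN */ 2", demo_processor)` yields an IDENTIFIER, a PLUS, an annotation token for EVEN, a NUMBER, and END_OF_STREAM; the plain comment contributes nothing to the stream.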
FIG. 2 shows a system, such as system 200, in accordance
with the present invention. The system preferably includes:
    a user interface 202, including a display 204 and one or
    more input devices 206;
    one or more central processing units 210;
    a main non-volatile storage unit 212, preferably a hard
    disk drive, for storing source files (102 FIG. 1) and
    target files (140 FIG. 1);
    a system memory unit 214, preferably including both high
    speed random-access memory (RAM) and read-only
    memory (ROM), for storing system control programs
    and application programs loaded from disk 212; and
    one or more internal buses 216 for interconnecting the
    aforementioned elements of the system.
The operation of system 200 is controlled primarily by
control programs that are executed by the system's data
processor 210. The system's control programs may be stored
in system memory 214. In a typical implementation, the
programs stored in the system memory 214 will include:
    an operating system 220;
    a file handling system 222;
    one or more application programs 224;
    a lexer 226;
    a parser 228; and
    a back-end 230.

Back-end 230 includes one or more tools 232 and corresponding
annotation processors 234. Each tool 232 processes
specific components of an abstract syntax tree passed
to back-end 230 by parser 228. As an illustration, a first tool
232 in back-end 230 may be a type checker, a second tool
232 may be an extended static checker, a third tool 232 may
be a program verifier, and so forth. Conceptually, each of
these tools has a corresponding annotation processor 234,
but as indicated above, some tools may share a common
annotation processor. In a preferred embodiment, at least
one of the annotation processors 234 includes an annotation
lexer 236 and an annotation parser 238.

As an example of how a source file is processed by a lexer 
and a parser to form an AST, consider a hypothetical 
programming-language with the following grammar: 



Program ::= ExprA EOS          ; where EOS denotes end-of-stream
ExprA   ::= ExprB "+" ExprA    ; addition, or
          | ExprB "-" ExprA    ; subtraction, or
          | ExprB
ExprB   ::= Variable           ; variable value, or
          | Number             ; numeric value, or
          | "-" ExprB          ; unary minus, or
          | "(" ExprA ")"      ; parenthetical expression.



The tokens for this hypothetical language may be: 



NUMBER(n)        ; where "n" denotes a non-negative integer
IDENTIFIER(s)    ; where "s" denotes a name
PLUS             ; addition operator ("+")
MINUS            ; subtraction operator ("-")
OPEN_PAREN       ; open parenthetical expression
CLOSE_PAREN      ; close parenthetical expression, and
END_OF_STREAM    ; end of input stream



Every token in the hypothetical language has a label such 
as "NUMBER" or "PLUS," and some tokens also have a 
parameter value, such as an integer (n) or a string (s). Now 
consider a particular one line source file 102 (FIG. 1) written 
in the hypothetical language: 

size +13 

The stream of characters corresponding to this one line 
source file is: 



"s" "i" "z" "e" " " "+" "1" "3" EOS



Referring to FIG. 1, lexer 104 converts this stream of
characters into the following sequence of tokens:

IDENTIFIER("size") PLUS NUMBER(13) END_OF_STREAM

The parser 130 (FIG. 1) then conceptually generates parse
tree 300 (shown in FIG. 3A) from these tokens. In practice,
the parser actually generates the AST data structure 132
shown in FIG. 3B. In a preferred embodiment AST 132 does
not contain unneeded and redundant information present in
parse tree 300. In FIG. 3B, the AST includes a program node
302, a node 304 for the addition expression, a node 306 for the
variable "size", and a node 308 for the number "13."

As another example, consider a second one line source file
written in the hypothetical language described above:

-x-5

The sequence of tokens corresponding to this second one
line source file is:

MINUS IDENTIFIER("x") MINUS NUMBER(5) END_OF_STREAM,

which is converted by parser 130 (FIG. 1) into parse tree 350
and AST 132 shown in FIGS. 3C and 3D, respectively. In
this example, lexer 104 converts any occurrence of the
character "-" into the token MINUS; however, the parser
130 may interpret MINUS either as the unary minus (negation)
operator or as the binary subtraction operator, depending on
which tokens precede or follow the MINUS token. In this
sense, parser 130 is context-sensitive whereas lexer 104 is
context-free.
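How a parser can give the single MINUS token two meanings may be sketched with a small recursive-descent parser for the hypothetical grammar above. The class name, the tuple-shaped tokens, and the node labels ("Add", "Sub", "Neg", "Var", "Num") are assumptions of this sketch; FIGS. 3B and 3D are not reproduced here, so the tree shapes are illustrative rather than a copy of the figures.

```python
from typing import List, Tuple

Token = Tuple[str, object]


class Parser:
    """Recursive-descent parser for the ExprA / ExprB grammar."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0]

    def take(self, label: str) -> Token:
        token = self.tokens[self.pos]
        if token[0] != label:
            raise SyntaxError(f"expected {label}, found {token[0]}")
        self.pos += 1
        return token

    def program(self):
        tree = self.expr_a()
        self.take("END_OF_STREAM")
        return ("Program", tree)

    def expr_a(self):
        left = self.expr_b()
        # A MINUS seen here follows a complete operand, so it is binary.
        if self.peek() in ("PLUS", "MINUS"):
            op = "Add" if self.take(self.peek())[0] == "PLUS" else "Sub"
            return (op, left, self.expr_a())
        return left

    def expr_b(self):
        label = self.peek()
        if label == "IDENTIFIER":
            return ("Var", self.take(label)[1])
        if label == "NUMBER":
            return ("Num", self.take(label)[1])
        if label == "MINUS":
            # A MINUS seen where an operand is expected is unary.
            self.take(label)
            return ("Neg", self.expr_b())
        self.take("OPEN_PAREN")
        inner = self.expr_a()
        self.take("CLOSE_PAREN")
        return inner
```

Because ExprA is right-recursive in the grammar as written, this sketch groups a chain such as a-b-c to the right; it follows the grammar and makes no claim about the intended arithmetic.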

Comments and Whitespace 

Note that in the two examples above, lexer 104 does not
produce any tokens for the " " (whitespace) characters of
source file 102. Most modern programming-languages are
designed in that way. In fact, most languages also allow the
program to include "comments" that the programmer writes
to document the source program. A standard lexer 104 also
ignores comments, and therefore, comments are never processed
by parser 130. This has the advantage that a programmer
can include whitespace and comments anywhere in
the program, as long as a comment or whitespace is not
inserted inside consecutive characters that make up a token.
It also means that the grammar for a programming-language
need not specify all places where comments or whitespace
can be placed.

Comments are usually delimited by a sequence of char- 
acters that begin the comment, and a sequence of characters 
that end the comment. For example, in the programming- 
languages C++ and Java, a comment can begin with the 
characters "/*" and end with the characters "*/". 

Thus, if a lexer for Java detects the character "/" followed
by the character "*" in a source file, the lexer
(assuming that it does not incorporate the present invention)
will ignore all following characters up until the next
consecutive occurrence of the characters "*" and "/".

Annotations

Referring to FIG. 1, annotations are used by tools 122.
Each tool 122 may have a set of annotations that it recognizes
and supports. The annotations are placed in the source
file along with the programming-language statements. A
standard lexer, one that does not incorporate the present
invention, treats the annotations as comments and does not
process them. However, in accordance with the present
invention the lexer 104 is modified to either (A) send all
comments to an annotation processor 124 for processing, or
(B) recognize the beginning of a string in a comment that
appears to represent an annotation, and pass that string to the
annotation processor 124.

As indicated earlier, each tool 122 (FIG. 1) in the back- 
end 120 may use a different set of annotations than the other 
tools. If a program (source file) contains annotations for use 
with more than one tool, it may contain annotations not 
recognized by the user-specified tool. Stated in another way, 
each tool 122 only processes the annotations that belong to 
the set of annotations recognized by the particular tool 122. 
Each tool 122 is preferably coded to ignore annotations in 
the AST that are not supported by the tool. Furthermore, 
each annotation processor 124 is preferably coded to return 
NULL values to the lexer 104 for annotations that are not 
supported by the corresponding tool 122. 
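The tool-side half of this arrangement, a tool that skips annotation tokens it does not support, can be sketched as a tree walk. The nested-tuple AST and the function name `supported_annotations` are assumptions made for illustration; the description does not prescribe an AST representation.

```python
from typing import Iterator, Set, Tuple


def supported_annotations(ast, supported: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield the annotation tokens in the AST whose label the tool supports."""
    if isinstance(ast, tuple):
        if len(ast) == 2 and ast[0] == "ANNOTATION_TOKEN":
            if ast[1] in supported:
                yield ast
            return  # annotations meant for other tools are skipped silently
        for child in ast:
            yield from supported_annotations(child, supported)
```

A tool that supports only EVEN, run over a tree that also carries a FREQUENTLY_USED annotation, would see just the EVEN token and would not fail on the other one.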

As a simple way to define which comments are to be
interpreted as annotations, the annotation language of a tool
122 may say that any comment whose first character is the
character "@" is an annotation. Thus, for example, the input
program fragment:

/* this is a comment */

would be considered a comment that is ignored whereas the
input program fragment:

/*@ this is an annotation */

would be considered an annotation. One of skill in the art
will recognize that there are many schemes, in addition to
the /*@*/ example described above, for distinguishing
annotations, which are processed by an annotation
processor, and comments, which are simply ignored.
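The "@" convention can be expressed in a few lines. The function name `annotation_text` is hypothetical, and the sketch assumes the comment has already been isolated, delimiters included, by the lexer.

```python
from typing import Optional


def annotation_text(comment: str) -> Optional[str]:
    """Return the annotation body of a /*@ ... */ comment, else None."""
    if len(comment) < 4 or not (comment.startswith("/*") and comment.endswith("*/")):
        raise ValueError("not a /* ... */ comment")
    body = comment[2:-2]
    if body.startswith("@"):
        return body[1:].strip()
    return None  # an ordinary comment, to be ignored
```

A different marker character, or a keyword test, could be substituted without touching the rest of the lexer.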

The following are examples of annotations that are useful
in particular tools 122:

/*@ NON_NULL */

/*@ FREQUENTLY_USED */

/*@ EVEN */

/*@ INVARIANT x<y+10 */

/*@ This is a comment used in some special way by a
program documentation system */

/*@ DEPENDS a[t: T] ON c[t] */

In most systems, each tool that uses annotations has a
custom designed lexer and parser that are used only with that
tool. As discussed above, in the present invention the lexer
104 and parser 130 are generic and are used with all the tools
(or at least a set of several tools) for processing programs
written in a particular programming-language. When new
tools are developed, or new types of annotations are developed
for an existing tool, the lexer 104 and parser 130
remain unchanged, since annotation lexing and parsing have
been compartmentalized and delegated to the annotation
processors 124.

Annotation Tokens 

Referring to FIG. 1, the present invention introduces the
concept of an "annotation token." An annotation token is like
a token in that it has a label and can be passed by lexer 104
to parser 130. An annotation token is distinguished from
other tokens in that its parameter value may be not only
a simple integer or string, but also a more complex
structure, for example, an abstract syntax tree.

Furthermore, the structure of the annotation token is not
defined by the lexer 104 or parser 130, but rather by a
specific tool 122. That is, the lexer 104 never "looks inside"
an annotation token, and is not dependent upon the internal
structure of the annotation token. This lets the lexer 104
remain independent of the tools 122.
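One way to picture an annotation token is as an ordinary token whose value field is opaque to the lexer. The class and function names below are assumptions of this sketch, as is the choice of Python dataclasses; the description leaves the concrete representation to the tool.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Token:
    label: str
    value: Any = None  # an integer or a string for ordinary tokens


@dataclass(frozen=True)
class AnnotationToken(Token):
    """Same shape as Token, but value may be any tool-defined structure,
    for example a secondary abstract syntax tree."""


def append_annotation(tokens: List[Token], annotation: AnnotationToken) -> None:
    """What the generic lexer does with an annotation token: store it unopened."""
    tokens.append(annotation)
```

Because `append_annotation` never reads `annotation.value`, a tool can change the payload's structure without any change to the lexer.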

Generating Annotation Tokens 

As mentioned above, when lexer 104 detects an annota- 
tion (or a comment that might contain an annotation), it 
passes the annotation to an annotation processor 124. Anno- 
tation processor 124 takes annotations 106 as input and 
returns annotation tokens 126 to the lexer. Lexer 104 passes 
the annotation tokens received from annotation processor 
124 to parser 130. 

FIG. 2 shows the details of one embodiment of an 
annotation processor 234 (124 FIG. 1). An annotation lexer 
236 receives an annotation from the lexer 226. The anno- 
tation lexer determines the lexical content of the annotation 
and passes one or more tokens to annotation parser 238. 
Annotation parser 238 generates an annotation token based 
upon the tokens passed to it by the annotation lexer 236. This 
annotation token is then returned to lexer 226. 

Note, with this structure, lexer 226 (104 FIG. 1) does not
need to know all possible annotation tokens that can be
returned by annotation processor 234. Lexer 226 simply
passes the annotation tokens to parser 228 as it would any
other token.

In some embodiments, the annotation processor 124 for
some tools may have a combined lexer and parser. This
combined lexer/parser is preferred when all annotations
defined for the tool are extremely simple in structure, each
annotation typically consisting of a label or a label and one
or two parameters. For more complex annotations, the
separate lexer and parser arrangement shown in FIG. 2 is
preferred.

The present invention works most cleanly when the 
annotation processor 124 is context-free. That is, the anno- 
tation processor 124 produces annotation tokens according 
to the given annotation 106, without regard to where in the 
source file 102 the annotations 106 occur. 

Although the annotation processor 124 is preferably 
context-free, the context of the annotation in source file 102 
may have meaning because the context of the annotation in 
source file 102 will affect how an annotation token 126, 
corresponding to the annotation 106, is assembled into AST 
132 by parser 130 and processed by tools 122. Put more 
simply, the position of each annotation token in the sequence 
of tokens sent by the lexer 104 to the parser 130 will provide 
context information for the annotation. 

As a simple example, consider a programming-language 
whose syntax is given by the following grammar: 



Program    ::= Statements EOS
Statements ::= Statement ";" Statements
             | Statement
Statement  ::= "VAR" Variable "IN" Statements "END"
             | Variable ":=" Expr          ; variable assignment
Expr       ::= Number                      ; numeric value, or
             | Variable                    ; variable value
             | Expr "+" Expr               ; addition



Now, consider a simple example in which a tool 122
allows a variable declaration to be annotated to, for example,
indicate that the variable declared is frequently used, and
that allows an assignment to be annotated to indicate that the
numeric value assigned to the variable is even or will be
frequently used in the rest of the program, or both even and
frequently used.

To allow the use of such annotations, the portion of the
grammar for the programming-language for defining a
Statement, where G* denotes any number of occurrences of
G's (including none), is modified to read as follows:

Statement  ::= "VAR" Annotation* Variable "IN" Statements "END"
             | Annotation* Variable ":=" Expr
Annotation ::= FrequentlyUsed | Even

where FrequentlyUsed and Even denote the respective annotations.
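A parser honoring the Annotation* slots can be sketched as below. The function names, the tuple-shaped tokens, and the node labels "VarDecl", "Assign", and "Sum" are assumptions of this sketch. The parser accepts any annotation token in a slot without inspecting it, and the surrounding statement determines what the annotation is attached to.

```python
from typing import List, Tuple

Token = Tuple[str, object]


def expect(tokens: List[Token], pos: int, label: str):
    if tokens[pos][0] != label:
        raise SyntaxError(f"expected {label}, found {tokens[pos][0]}")
    return tokens[pos][1], pos + 1


def parse_annotations(tokens: List[Token], pos: int):
    # Annotation*: collect annotation tokens without looking inside them.
    found = []
    while tokens[pos][0] == "ANNOTATION_TOKEN":
        found.append(tokens[pos])
        pos += 1
    return found, pos


def parse_expr(tokens: List[Token], pos: int):
    operands = []
    while True:
        if tokens[pos][0] not in ("NUMBER", "IDENTIFIER"):
            raise SyntaxError(f"expected operand, found {tokens[pos][0]}")
        operands.append(tokens[pos])
        pos += 1
        if tokens[pos][0] != "PLUS":
            return ("Sum", operands), pos
        pos += 1


def parse_statement(tokens: List[Token], pos: int):
    if tokens[pos][0] == "VAR":
        annotations, pos = parse_annotations(tokens, pos + 1)
        name, pos = expect(tokens, pos, "IDENTIFIER")
        _, pos = expect(tokens, pos, "IN")
        body, pos = parse_statements(tokens, pos)
        _, pos = expect(tokens, pos, "END")
        return ("VarDecl", annotations, name, body), pos
    annotations, pos = parse_annotations(tokens, pos)
    name, pos = expect(tokens, pos, "IDENTIFIER")
    _, pos = expect(tokens, pos, "BECOMES")
    expr, pos = parse_expr(tokens, pos)
    return ("Assign", annotations, name, expr), pos


def parse_statements(tokens: List[Token], pos: int = 0):
    stmt, pos = parse_statement(tokens, pos)
    stmts = [stmt]
    while tokens[pos][0] == "SEMICOLON":
        stmt, pos = parse_statement(tokens, pos + 1)
        stmts.append(stmt)
    return stmts, pos
```

The same annotation token placed after VAR ends up inside a "VarDecl" node, and placed before an assignment ends up inside an "Assign" node, which is the context sensitivity the grammar slots provide.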

Using our invention, the precise grammar for annotations
is known only to the tool-specific annotation processor 234
(FIG. 2); the non-tool specific lexer 226 and parser 228 treat
Annotation as denoting any annotation (token). We allow
multiple annotations for a given statement to allow a variable
to be declared both frequently used and even. It is the
job of the tool 122 (FIG. 1) to disallow the use of an Even
annotation on a variable declaration. Note that this factoring
of the grammar allows us to change the set of legal annotations
later without changing the non-tool specific lexer 226
and parser 228 so long as the new annotations can only
appear in the same places as the old annotations.

In accordance with the programming-language grammar
specified above, the tokens that can be returned by the lexer
are:

NUMBER(n)
IDENTIFIER(s)
PLUS
VAR
IN
END
BECOMES          ; the token for ":="
SEMICOLON
END_OF_STREAM,

and the forms of annotation tokens that the annotation
processor (and thus also the lexer) can return are:

ANNOTATION_TOKEN(FREQUENTLY_USED)
ANNOTATION_TOKEN(EVEN)

Thus, consider the following annotated program that is
written in the programming-language defined above:

VAR x IN
  x := 5;
  VAR /*@ FREQUENTLY_USED */ y IN
    y := x+3;
    /*@ EVEN */ y := y+2;
    /*@ FREQUENTLY_USED */ x .
    y := y+x+y+x+y+x
  END
END

The lexer 104, after returning the second VAR token and
upon recognizing the characters "/*@", will generate a
substream consisting of the following characters:

"F" "R" "E" "Q" "U" "E" "N" "T" "L" "Y" "_" "U" "S" "E" "D" EOS

This substream is sent by the lexer 104 to the annotation
processor 124. The annotation processor 124 will then
produce the following annotation token:

ANNOTATION_TOKEN(FREQUENTLY_USED)

This annotation token is returned by the annotation processor
124 to lexer 104, and lexer 104 passes it on to parser 130.

After lexer 104 reaches the second FREQUENTLY_USED
annotation, lexer 104 will pass a substream also
consisting of the characters above to the annotation processor
124, which will again return:

ANNOTATION_TOKEN(FREQUENTLY_USED).

It is noted that the same annotation token is returned, even
though the parser 130 will use the annotation token in
different ways in these two cases. The first
FREQUENTLY_USED annotation applies to a variable
that is declared, whereas the second annotation applies to the
result of an assignment statement. Accordingly, the two
FREQUENTLY_USED annotations are assembled by the
parser 130 into the AST in a context-sensitive manner
and processed by a tool that supports the FREQUENTLY_USED
annotation.



In the example above, annotation processor 124 is quite
simple. In general, however, annotation processor 124 may
construct more complex annotation tokens. For example, to
create an annotation token for the INVARIANT annotation:

/*@ INVARIANT x<y+10 */

the annotation processor must parse the expression that
follows the keyword INVARIANT to generate the annotation
token depicted in FIG. 4A. More specifically, for tools
using complex annotations of this type, the annotation
processor will preferably include a lexer 236 (FIG. 2) that
converts the annotation text into a sequence of tokens, and
a parser 238 that assembles the tokens into an abstract syntax
tree in accordance with the grammar of the "annotation
language" for the tool.
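An annotation processor with its own small lexer and parser can be sketched as follows. Everything here is an assumption made for illustration: the function names, the tiny expression grammar (sums compared with "<"), and the nested-tuple form of the secondary abstract syntax tree. FIG. 4A is not reproduced in this text, so the tree shape below is not claimed to match it.

```python
import re
from typing import List, Optional, Tuple


def annotation_lex(text: str) -> List[str]:
    """Annotation lexer: split the text into names, numbers, and operators.
    Characters outside this small vocabulary are dropped by this sketch."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[<+]", text)


def parse_atom(words: List[str], pos: int):
    if pos >= len(words) or words[pos] in ("<", "+"):
        raise SyntaxError("expected a name or a number")
    word = words[pos]
    return (int(word) if word.isdigit() else word), pos + 1


def parse_sum(words: List[str], pos: int):
    left, pos = parse_atom(words, pos)
    while pos < len(words) and words[pos] == "+":
        right, pos = parse_atom(words, pos + 1)
        left = ("+", left, right)
    return left, pos


def parse_comparison(words: List[str], pos: int):
    left, pos = parse_sum(words, pos)
    if pos < len(words) and words[pos] == "<":
        right, pos = parse_sum(words, pos + 1)
        left = ("<", left, right)
    return left, pos


def process_annotation(text: str) -> Optional[Tuple[str, str, object]]:
    """Annotation processor: return an annotation token, or None if the
    text is not an annotation this (hypothetical) tool supports."""
    words = annotation_lex(text)
    if not words or words[0] != "INVARIANT":
        return None
    tree, pos = parse_comparison(words, 1)
    if pos != len(words):
        raise SyntaxError("unexpected text after invariant expression")
    return ("ANNOTATION_TOKEN", "INVARIANT", tree)
```

The generic lexer would receive the returned tuple and forward it untouched; only the tool that supports INVARIANT needs to understand the third element.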

Even a complex annotation token such as that depicted in
FIG. 4A is not processed by lexer 104 (FIG. 1) or parser 130.
Rather, parser 130 assembles the annotation token into an
AST without "looking inside" the annotation token. Then,
when the parser passes the AST to the back-end, a tool
capable of processing the INVARIANT annotation analyzes
the token depicted in FIG. 4A.

Tools may support annotations that are highly complex
mathematical formulas. For example, a tool may support the
annotation:

/*@ x=quad(a,b,c) */.

In such an example, lexer 104, noting the /*@ */ structure,
will pass the annotation to an annotation processor 124. An
annotation processor that supports the quadratic function
will then generate an abstract syntax tree in accordance with
the quadratic equation:

$$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$

For tools that utilize complex annotation tokens, preferred
embodiments of the annotation processor include an annotation
lexer and an annotation parser.

EXAMPLES 

The advantage of the system and methods of the present
invention can further be illustrated by considering the
following examples.

Example 1 

Consider the hypothetical programming-language: 



P ::= ε | S ";" P
S ::= ε
    | "var" X
    | X "=" E
    | "if" E "then" S1 "else" S2
E ::= X
    | E1 … E2
    | E1 … E2
    | E1 … E2
    | E1 … E2

where ε represents a null element and X represents a variable
name such as x. Now consider the following two-line
program written in the hypothetical programming-language:
A parser will build the AST depicted in FIG. 4B. Now,
suppose that we desire to change the programming-language
to support context-sensitive annotations such as:

    var x annotation; and

    annotation;

These annotations are context-sensitive in the sense that
in the first case they apply to the variable immediately
preceding them and in the second case they act as a new kind
of statement. Such annotations may be used to convey
special meaning to a back-end tool such as an error-checker.
For example, the annotation "/*@ non_null */" when
placed in a variable declaration might mean that the declared
variable should never be assigned a null value, and the
annotation "/*@ assert x>0 */" placed where a statement
could go might instruct the error checker to make sure that,
when the program reaches that point, x is greater than 0.

    P ::= ε | S ";" P
    S ::= ε
        | Annotation
        | "var" X Annotation*
        | X "=" E
        | "if" E "then" S1 "else" S2
    E ::= X
        | E1 [illegible] E2
        | E1 "+" E2
        | E1 [illegible] E2
        | E1 [illegible] E2

The two-line program written in the hypothetical
programming-language may then read:

    var x annotation;

    x=x+x;

When the lexer detects the annotation, it will send it to the
appropriate annotation processor. The annotation processor
will generate an annotation token and return the annotation
token to the lexer. The lexer will pass the annotation token
along with the tokens generated by the lexer to the parser.
The parser will then generate the AST depicted in FIG. 4C.
The annotation token assembled into the AST by the parser
will not be processed until the AST is passed to the
appropriate tool. Thus, the lexer need only be recoded to the
extent that it distinguishes "annotations" from comments in
order to support the newly modified hypothetical language.

Example 2

In practice, each type of annotation often makes sense
only when placed in certain annotation slots of the modified
programming-language grammar. For example, the non_null
annotation is meaningless when used as a statement and,
since assert annotations act like statements, it doesn't make
much sense to allow them to be attached to variable
declarations. While tool 122 can scan an AST and complain about
ill-placed annotations, it is easier, in such cases, to put the
information about where annotations can occur directly in
the non-tool-specific grammar. For example:

    P ::= ε | S ";" P
    S ::= ε
        | StatementAnnotation
        | "var" X DeclAnnotation*
        | X "=" E
        | "if" E "then" S1 "else" S2
    E ::= X
        | E1 [illegible] E2
        | E1 "+" E2
        | E1 [illegible] E2
        | E1 [illegible] E2

Here, we have divided up the set of possible annotations
into those that may be attached to variable declarations
(DeclAnnotations) and those that may be used like
statements (StatementAnnotations).

To handle this, annotation tokens now contain information
about what kind they are (DeclAnnotation or
StatementAnnotation). The non-tool-specific lexer 104
works as before. The non-tool-specific parser 130 is
modified to use this information when generating parse trees.
It ignores all other information in annotation tokens. This
means that we may change the set of annotations in any way
without changing the non-tool-specific lexer 226 or parser
228 so long as every annotation can appear only either where
DeclAnnotation appears in the grammar or where
StatementAnnotation appears in the grammar. For most programming
languages, given a reasonable choice of grammar slots, this
limitation is seldom an issue. Putting information about the
kinds of annotations into the grammar also has the
advantage of making the parser's job easier because it may need
to do less lookahead to determine what it is seeing; the kind
information may also enable the parser to produce better
error messages.

Example 3

The system and method of the present invention are
particularly advantageous when annotations are represented as
object-oriented classes. Referring to FIG. 5A, consider a
particular tool that represents annotations as subclasses 510
of a class named BASE_CLASS 500. Because subclasses
can be used anywhere a superclass is required, this means
that the lexer 104 and parser 130 need deal with annotation
tokens only of type BASE_CLASS. Moreover, if new
annotations are added later that require new classes, we can
avoid having to make any change to the original lexer and
parser by making the new classes subclasses of
BASE_CLASS.

Referring to FIG. 5B, in some embodiments, the original
lexer may be designed to recognize multiple kinds of
annotations (e.g., Example 2). In this case, it is most
advantageous to have a separate base class 550 for each kind of
annotation. Thus, for example, all annotations of variable
declarations might be subclasses of BASE_CLASS 1 and
all statement annotations might be subclasses of
BASE_CLASS 2. Here, the lexer 104 and parser 130 need
deal with annotation tokens only of type BASE_CLASS 1
or BASE_CLASS 2. This means that they do not need to
be changed as new annotation subclasses are added to the
base classes.

The foregoing description has been directed to specific
embodiments of this invention. It will be apparent, however,
that variations and modifications may be made to the
described embodiments, with the attainment of all or some
of the advantages. Therefore, it is the object of the appended
claims to cover all such variations and modifications as
come within the spirit and scope of the invention.

What is claimed is:

1. A system for parsing annotations in a character stream
representing a source program written in a computer
programming-language, comprising:
a central processing unit;
a memory; 

at least one bus connecting the central processing unit and 
the memory; 

the memory storing a lexer, a back-end, and a parser 
the lexer for generating tokens based upon the 
programming-language statements present in the 
character stream, detecting the annotations of the 
character stream, passing the detected annotations to 
the back-end, receiving annotation tokens from the 
back-end, and passing both the tokens and the anno- 
tation tokens to the parser; 
the back-end including a plurality of annotation 
processors, for processing annotations received from 
the lexer into annotation tokens, and a plurality of 
tools, for processing a primary abstract syntax tree 
received from the parser; wherein, for each annota- 
tion processor in the plurality of annotation 
processors, there is a corresponding tool in the 
plurality of tools; and 
the parser for receiving tokens and annotation tokens 
from the lexer and assembling the tokens and the 
annotation tokens into the primary abstract syntax 
tree, and for passing the assembled primary abstract 
syntax tree to the back-end. 

2. The system of claim 1 wherein 
the annotations are context-sensitive, and 
the annotation processors process the annotations into 

annotation tokens without respect to context of the 
annotations. 

3. The system of claim 1, wherein at least one of the 
annotation tokens includes a secondary abstract syntax tree 
and the parser is further configured to assemble the second- 
ary abstract syntax tree into the primary abstract syntax tree. 

4. The system of claim 1, wherein the annotation proces- 
sor includes an annotation lexer that is context-free, and an 
annotation parser that is context-sensitive. 

5. A method for parsing annotations in a character stream 
representing a source program written in a computer
programming-language, comprising: 

in a lexer for a predefined computer programming- 
language, 

converting computer programming-language state- 
ments present in the character stream into tokens; 

detecting annotations in the character stream; 

passing the annotations to an annotation processor that 
is selected from a plurality of annotation processors; 

receiving annotation tokens from the annotation pro- 
cessor 

passing the tokens and annotation tokens to a parser; 

in the selected annotation processor, converting the anno- 
tations into the annotation tokens; 

in the parser, assembling the tokens and annotation tokens 
into a primary abstract syntax tree; 

passing the primary abstract syntax tree to a tool, selected 
from a plurality of tools, the selected tool correspond- 
ing to the selected annotation processor; and 
in the selected tool, processing the primary abstract syntax
tree.

6. The method of claim 5, wherein 

the annotations are context-sensitive; and 

the selected annotation processor processes the annota- 
tions into annotation tokens without respect to context 
of the annotations. 

7. The method of claim 5, wherein at least one of the 
annotation tokens includes a secondary abstract syntax tree 
and the parser assembles the secondary abstract syntax tree 
into the primary abstract syntax tree. 

8. The method of claim 5 wherein the annotation proces- 
sor includes an annotation lexer that is context-free, and an 
annotation parser that is context-sensitive. 

9. A computer program product for use in conjunction 
with a computer controlled system, the computer program 
product comprising a computer readable storage medium 
and a computer program mechanism embedded therein, the 
computer program mechanism comprising: 

a lexer for generating tokens based upon the 
programming-language statements present in the char- 
acter stream, detecting the annotations of the character 
stream, passing the detected annotations to a back-end, 
receiving annotation tokens from the back-end, and 
passing both the tokens and the annotation tokens to a 
parser; 

the back-end including a plurality of annotation 
processors, for processing annotations received from 
the lexer into annotation tokens, and a plurality of tools, 
for processing a primary abstract syntax tree received 
from the parser; wherein, for each annotation processor 
in the plurality of annotation processors, there is a 
corresponding tool in the plurality of tools; 

the parser for receiving tokens and annotation tokens from 
the lexer and assembling the tokens and the annotation 
tokens into the primary abstract syntax tree, and for 
passing the assembled primary abstract syntax tree to 
the back-end. 

10. The computer program product of claim 9 wherein: 
the annotations are context-sensitive, and 

the annotation processors process the annotations into 
annotation tokens without respect to context of the 
annotations. 

11. The computer program product of claim 9, wherein at 
least one of the annotation tokens includes a secondary 
abstract syntax tree and the parser is further configured to 
assemble the secondary abstract syntax tree into the primary 
abstract syntax tree. 

12. The computer program product of claim 9, wherein 
the annotation processor includes an annotation lexer which 
is context-free, and an annotation parser which is context- 
sensitive. 